Data visualization best practices

GEOG 30323

March 17, 2020

Data visualization

  • Thus far: we’ve learned how to use data visualization to explore our data
  • In the weeks to come:
    • Best practices in data visualization
    • Advanced chart types
    • Interactive visualization
    • Geographic visualization (maps!)
    • Putting it all together!
Source: Wikimedia Commons
Source: Nathan Yau/FlowingData

Anscombe’s Quartet

Source: Wikimedia Commons

Considerations when visualizing data

  • What are you visualizing?
  • Who is your audience?
  • In what format will you be presenting the visualization?

Visual variables

Source: Data Points

Color

  • Hue: color, commonly understood (red, blue, green)
  • Lightness or Value: extent to which color is light or dark
  • Saturation: vividness of the color
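The three components can be explored with Python's standard-library colorsys module. This is only a sketch: the hue and saturation values below are arbitrary choices for illustration, and colorsys expects all three components on a 0 to 1 scale.

```python
import colorsys

# Assumed values for illustration: a blue hue at fairly high saturation
hue, saturation = 0.6, 0.8
lightness_levels = [0.2, 0.5, 0.8]

# Hold hue and saturation fixed and vary lightness only (dark to light)
shades = [colorsys.hls_to_rgb(hue, lightness, saturation)
          for lightness in lightness_levels]

for lightness, (r, g, b) in zip(lightness_levels, shades):
    print(f"lightness={lightness}: RGB=({r:.2f}, {g:.2f}, {b:.2f})")
```

Note that colorsys orders the components as hue, lightness, saturation (HLS), not HSL.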

Color schemes

Source: Data Points

Color and context

Source: FiveThirtyEight.com

Color-blindness

Source: SBNation.com

Good use of color

Source: Kirk Goldsberry/Grantland

Poor use of color

Source: Jonathan Cohn via Kenneth Field/Cartonerd

Color and visual variables

Examples

Let’s fetch some data:

The ‘heat map’

Source: The Wall Street Journal

Heat maps in seaborn

  • Available in seaborn’s heatmap() function; takes a wide data frame in which the index labels the rows (y-axis) and the column headers label the columns (x-axis)
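A minimal sketch of the reshaping step, assuming the data start out in long form. The data frame and its column names (year, month, value) are made up for illustration and are not the dataset used in class.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical long-format data: one row per (year, month) pair
rng = np.random.default_rng(0)
long_df = pd.DataFrame({
    "year": np.repeat([2017, 2018, 2019], 4),
    "month": np.tile(["Jan", "Feb", "Mar", "Apr"], 3),
    "value": rng.integers(0, 100, size=12),
})

# Reshape to wide form: one row per month, one column per year
wide_df = long_df.pivot(index="month", columns="year", values="value")

# Rows (the index) run down the y-axis; columns run along the x-axis
ax = sns.heatmap(wide_df, annot=True, cmap="Blues")
```

pivot() sorts the index, so the months appear alphabetically here; reorder the rows of wide_df before plotting if a calendar order is needed.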

The seaborn ‘heat map’

Color palettes in seaborn

  • ColorBrewer: popular color schemes for visualization
  • Support for ColorBrewer built into seaborn
  • See more at http://colorbrewer2.org/

Color in seaborn

  • Color palettes are created with the color_palette() function, can be viewed with the palplot() function, and are reversed by appending _r to the palette name
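For instance, the following sketch builds a ColorBrewer palette and its reverse. The choice of "Blues" and of 7 colors is arbitrary.

```python
import seaborn as sns

# A ColorBrewer sequential palette with 7 colors (light to dark)
blues = sns.color_palette("Blues", 7)

# Appending _r to the palette name reverses the order (dark to light)
blues_r = sns.color_palette("Blues_r", 7)

# Draw each palette as a strip of color swatches
sns.palplot(blues)
sns.palplot(blues_r)
```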

Color in seaborn

  • color_palette() also allows for the creation of custom palettes!
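One way to do this is to pass a list of colors. The hex codes below are hypothetical and chosen only for illustration.

```python
import seaborn as sns

# Hypothetical hex colors; any valid matplotlib color strings would work
custom_colors = ["#4d194d", "#1b9e77", "#d95f02"]

# Build a palette from the list and preview it
custom_palette = sns.color_palette(custom_colors)
sns.palplot(custom_palette)
```

The resulting palette can then be handed to most seaborn plotting functions through their palette argument.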

Highlighting and annotation

Source: Data Points

The “spaghetti” chart

Highlighting

Highlighting code setup

Highlighting code
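The original code for this slide is not preserved here. The sketch below shows one common approach: draw every series in a muted gray, then draw the series of interest last in a strong color. The data, series names, and colors are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: one column per series, indexed by year
rng = np.random.default_rng(1)
years = np.arange(2000, 2020)
df = pd.DataFrame(
    rng.normal(0, 1, size=(len(years), 6)).cumsum(axis=0),
    index=years,
    columns=["A", "B", "C", "D", "E", "F"],
)
focus = "C"  # the series to emphasize

fig, ax = plt.subplots(figsize=(8, 5))
for name in df.columns:
    if name == focus:
        continue
    # Context series: thin, muted gray
    ax.plot(df.index, df[name], color="lightgray", linewidth=1)

# Focus series drawn last so it sits on top, in a strong color
ax.plot(df.index, df[focus], color="darkred", linewidth=2.5, label=focus)
ax.legend()
```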

Annotation in Python

Annotation code
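The original code for this slide is not preserved here. As a sketch, matplotlib's annotate() method can attach a text label and an arrow to a data point. The values and label text are made up.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(10)
y = np.array([2, 3, 3, 4, 9, 5, 4, 4, 3, 3])  # hypothetical values

fig, ax = plt.subplots()
ax.plot(x, y, color="steelblue")

# Locate the peak and point an arrow at it
peak = int(y.argmax())
note = ax.annotate(
    "Peak value",
    xy=(x[peak], y[peak]),            # the point being annotated
    xytext=(x[peak] + 1.5, y[peak]),  # where the label text sits
    arrowprops=dict(arrowstyle="->"),
    va="center",
)
```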

Small multiples

Source: Data Points

Small multiples in Python

Modifying chart options

  • seaborn is a wrapper around matplotlib, the main plotting engine for Python
  • In turn, all matplotlib customization methods are available for your seaborn plots - and there are many!
  • To get access: import matplotlib.pyplot as plt

Formatting axes & labels

  • Example:
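The original example is not preserved here. The sketch below, on a made-up data frame, shows the general pattern: draw the seaborn plot, then call pyplot functions to set the title, axis labels, and tick formatting.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical wide data frame: quarters as rows, years as columns
rng = np.random.default_rng(3)
wide_df = pd.DataFrame(
    rng.integers(0, 100, size=(4, 3)),
    index=["Q1", "Q2", "Q3", "Q4"],
    columns=[2017, 2018, 2019],
)

plt.figure(figsize=(6, 4))
ax = sns.heatmap(wide_df, cmap="Greens")

# pyplot calls act on the current axes, i.e. the heatmap just drawn
plt.title("Hypothetical values by quarter and year")
plt.xlabel("Year")
plt.ylabel("Quarter")
plt.yticks(rotation=0)  # keep the row labels horizontal
```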

Modified heatmap

seaborn and matplotlib

  • seaborn plotting functions return matplotlib objects (for grid-based plots, a seaborn grid that wraps them), which can be modified with the options in the pyplot module
  • Often, these options are wrapped by seaborn and available as arguments, so check the documentation to see what you can do!
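As a sketch of what this looks like in practice, an axes-level function such as scatterplot() hands back a matplotlib Axes whose methods can be called directly. The data and labels are hypothetical.

```python
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(4)
df = pd.DataFrame({"x": rng.normal(size=50), "y": rng.normal(size=50)})

# The returned object is a matplotlib Axes
ax = sns.scatterplot(data=df, x="x", y="y")

# Any matplotlib Axes method can now be used to customize the plot
ax.set_title("Hypothetical scatterplot")
ax.set_xlabel("X variable")
ax.set_xlim(-3, 3)
```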

Scatterplot smoothing

  • Local regression (LOESS) is used to fit smooth curves through the points of a scatterplot
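In seaborn, one option is regplot() with lowess=True. This is a sketch on simulated data, and it assumes the statsmodels package is installed, since seaborn relies on it for the smoothing.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Simulated data: a noisy sine wave, which a straight line would fit poorly
rng = np.random.default_rng(5)
x = np.linspace(0, 10, 100)
df = pd.DataFrame({"x": x, "y": np.sin(x) + rng.normal(0, 0.3, size=x.size)})

# lowess=True replaces the straight regression line with a local-regression curve
ax = sns.regplot(
    data=df, x="x", y="y",
    lowess=True,
    scatter_kws={"alpha": 0.5},
    line_kws={"color": "darkred"},
)
```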

Scatterplot matrices

Image resolution

  • Higher resolution: greater detail in an image
  • Commonly measured in dpi (dots per inch)
Source: Wikimedia Commons

Exporting your visualizations

  • To save your visualizations from the Jupyter Notebook:
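The original code for this slide is not preserved here. A common approach is matplotlib's savefig() method, sketched below. The file name and output location are hypothetical, and the temporary directory is used only to keep the example self-contained.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))  # figure size in inches
ax.plot([1, 2, 3], [2, 4, 3])
ax.set_title("Hypothetical chart")

# Hypothetical output path; a plain name such as "my_chart.png"
# would save the file in the current working directory instead
out_path = os.path.join(tempfile.gettempdir(), "my_chart.png")

# dpi sets the resolution: 6 x 4 inches at 300 dpi is 1800 x 1200 pixels
fig.savefig(out_path, dpi=300)
```

The file extension determines the output format, so a name ending in .pdf or .svg produces a vector file instead of a PNG.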